Simple and Effective Dimensionality Reduction for Word Embeddings
Abstract
Word embeddings have become the basic building blocks for several natural language processing and information retrieval tasks. Recently, there has been an emphasis on further improving the pre-trained word vectors through post-processing algorithms. One such area of improvement is the dimensionality reduction of word embeddings. Reducing the size of word embeddings through dimensionality reduction can improve their utility in memory-constrained devices. In this work, we devise an algorithm that effectively combines PCA-based dimensionality reduction with a post-processing algorithm to construct word embeddings of lower dimensions. Empirical evaluations on 12 standard word similarity benchmarks show that our algorithm reduces the embedding dimensionality by 50% while achieving similar or (more often) better performance than the original embeddings.
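The abstract does not spell out the pipeline, but a minimal sketch of the combination it describes might look as follows, assuming the post-processing step is the common "subtract the mean, remove the top few principal components" algorithm and an illustrative choice of d = 7 removed components (the function names, hyperparameters, and the ordering of steps are assumptions for illustration, not details taken from the paper):

```python
import numpy as np
from sklearn.decomposition import PCA

def post_process(X, d=7):
    # Assumed post-processing: centre the vectors, then remove their
    # projections onto the top d principal components (d=7 is an
    # illustrative hyperparameter, not a value from the abstract).
    X = X - X.mean(axis=0)
    pca = PCA(n_components=d).fit(X)
    return X - (X @ pca.components_.T) @ pca.components_

def reduce_dimensions(X, new_dim, d=7):
    # Post-process, reduce with PCA to new_dim, then post-process again.
    X = post_process(X, d)
    X = PCA(n_components=new_dim).fit_transform(X)
    return post_process(X, d)

# Stand-in for pre-trained 300-d embeddings; halve the dimensionality.
vectors = np.random.randn(10000, 300)
reduced = reduce_dimensions(vectors, new_dim=150)
print(reduced.shape)  # (10000, 150)
```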
Similar resources
Understanding and Improving Multi-Sense Word Embeddings via Extended Robust Principal Component Analysis
Representations of polysemous words learned without supervision generate a large number of pseudo multi-senses, since unsupervised methods are overly sensitive to contextual variations. In this paper, we address pseudo multi-sense detection for word embeddings by dimensionality reduction of sense pairs. We propose a novel principal component analysis method, termed ExRPCA, designed to detect both pseudo multi-sense...
Word Re-Embedding via Manifold Dimensionality Retention
Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from word co-occurrences in a corpus. Word embeddings may underestimate the similarity between nearby words and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a stage of manifold learning which retains dimensionality...
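The excerpt does not name the manifold learner, but as an illustrative sketch, re-embedding pre-trained vectors while retaining their dimensionality could be done with scikit-learn's Isomap (the choice of Isomap, the neighbourhood size, and the toy data are assumptions, not details from this paper):

```python
import numpy as np
from sklearn.manifold import Isomap

# Stand-in for pre-trained embeddings (vocabulary size x dimension).
vectors = np.random.randn(2000, 50)

# Re-embed with a manifold learning stage that keeps the original
# dimensionality, so the output slots into existing pipelines.
isomap = Isomap(n_neighbors=15, n_components=vectors.shape[1])
re_embedded = isomap.fit_transform(vectors)
print(re_embedded.shape)  # (2000, 50)
```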
The Role of Context Types and Dimensionality in Learning Word Embeddings
We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks. Our results suggest that while intrinsic tasks tend to exhibit a clear preference for particular types of contexts and higher dimensionality, more careful tuning is required for finding the optimal settings...
Exponential Family Embeddings
Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of high-dimensional data. As examples, we studied neural data with real-valued observations, count data from a market basket analysis, and ratings data from a mov...
Word, graph and manifold embedding from Markov processes (Tatsunori Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola)
Continuous vector representations of words and objects appear to carry surprisingly rich semantic content. In this paper, we advance both the conceptual and theoretical understanding of word embeddings in three ways. First, we ground embeddings in semantic spaces studied in the cognitive-psychometric literature and introduce new evaluation tasks. Second, in contrast to prior work, we take metric rec...